A Pattern-Based Method for Document Structure Recognition
Abstract
One of the main goals of the CIDRE project is the design of an interactive document recognition system that improves with use. In previous work, members of this project have described software architecture issues [1, 6] as well as font and logical structure recognition algorithms [8, 2]. In this paper, we present a new method for document structure recognition based on geometrical and logical layout analysis. Most traditional methods for layout analysis are restricted to simple document structures (see [4] for an overview of these methods). However, recent work shows that complex layout analysis, such as newspaper page decomposition, must also be considered [3] and, in particular, illustrates the need for learning-based algorithms [5] for these tasks. The method we propose in this paper is both adaptive and interactive. As we will see, this combination of learning techniques and user interaction helps prevent repetitive errors and supports the handling of complex and previously unknown document structures. The paper is organized as follows. In Section 2, we describe our new method for document structure recognition. In Section 3, we describe two potential applications of this method: classification and segmentation of document structure objects. Section 4 gives some preliminary results from evaluating the method on classification tasks. Finally, we conclude with directions for future research.
Similar resources
Methodology for Validation of Issuance of Mystical and Ethical Narrations (A Case Study and Discourse Analysis on the Methodology of the Book Sirr ul-asra’)
The book “The Secret of Prophet Mohammad’s Midnight Journey to the Seven Heavens in Explanation of Al-Mi’raj Hadith” was written by Ayatollah Sa’adatparvar. This paper investigates his method of assessing this hadith by analyzing the discourse of part of the book’s introduction. The paper aims at investigating the particular discourse pattern of the author in analyzing the document of ...
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of handwritten documents into structured text. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
A New Statistical Approach for Recognizing and Classifying Patterns of Control Charts (RESEARCH NOTE)
Control chart pattern (CCP) recognition techniques are widely used to identify potential process problems in modern industries. Recently, artificial neural network (ANN)-based techniques have become very popular for recognizing CCPs. However, finding a suitable architecture for an ANN-based CCP recognizer and training it are time-consuming and tedious. In addition, because of the black box ...
Image Pattern Recognition-Based Morphological Structure and Applications
This chapter describes a new pattern recognition method: pattern recognition-based morphological structure. First, smooth following and linearization are introduced based on difference chain codes. Second, morphological structural points are described in terms of smooth followed contours and linearized lines, and then the patterns of morphological structural points and their properties...
A Micropower Current-Mode Euclidean Distance Calculator for Pattern Recognition
In this paper, a new synthesis for the circuit design of Euclidean distance calculation is presented. The circuit is implemented based on a simple two-quadrant squarer/divider block. The circuit, which employs floating-gate MOS (FG-MOS) transistors operating in the weak inversion region, features low circuit complexity, low power (<20 µW), low supply voltage (0.5 V), two-quadrant input current, wide dyn...